Overview

Dataset statistics

Number of variables: 18
Number of observations: 891
Missing cells: 1220
Missing cells (%): 7.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 413.6 KiB
Average record size in memory: 475.3 B
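
The percentage and per-record figures above follow from the raw counts. The short sketch below recomputes them as a sanity check; it assumes that "missing cells (%)" is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes. The variable names are illustrative only.

```python
# Values copied from the dataset statistics above.
n_variables = 18
n_observations = 891
missing_cells = 1220
total_size_kib = 413.6

# Assumption: the missing percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both recomputed values round to the figures reported above (7.6% and 475.3 B).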

Variable types

NUM: 10
CAT: 7
BOOL: 1

Reproduction

Analysis started: 2020-08-03 21:26:07.377885
Analysis finished: 2020-08-03 21:26:34.784501
Duration: 27.41 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Ticket has a high cardinality: 681 distinct values [High cardinality]
Cabin has a high cardinality: 147 distinct values [High cardinality]
AgeNotNull is highly correlated with Age and 3 other fields [High correlation]
Age is highly correlated with AgeNotNull and 3 other fields [High correlation]
AgeFillNa-1 is highly correlated with Age and 2 other fields [High correlation]
AgeFillNaSexMean is highly correlated with Age and 2 other fields [High correlation]
InverseAge is highly correlated with Age and 3 other fields [High correlation]
Age has 177 (19.9%) missing values [Missing]
Cabin has 687 (77.1%) missing values [Missing]
AgeNotNull has 177 (19.9%) missing values [Missing]
InverseAge has 177 (19.9%) missing values [Missing]
Ticket is uniformly distributed [Uniform]
Cabin is uniformly distributed [Uniform]
PassengerId has unique values [Unique]
Name has unique values [Unique]
SibSp has 608 (68.2%) zeros [Zeros]
Parch has 678 (76.1%) zeros [Zeros]
Fare has 15 (1.7%) zeros [Zeros]
Relatives has 537 (60.3%) zeros [Zeros]

Variables

PassengerId
Real number (ℝ≥0)

UNIQUE

Distinct count: 891
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446.0
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45.5
Q1: 223.5
median: 446
Q3: 668.5
95-th percentile: 846.5
Maximum: 891
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5770265516
Kurtosis: -1.2
Mean: 446
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 397386
Variance: 66231
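
The PassengerId statistics can be reproduced with the standard library alone. The sketch below assumes that PassengerId is simply the integer sequence 1 to 891 (consistent with the minimum, maximum, and distinct count above), that the variance uses the sample (n - 1) denominator, and that quartiles are linearly interpolated; these are inferences from the reported numbers, not documented behaviour of the tool.

```python
import statistics

# Assumption: PassengerId is the integer sequence 1..891.
ids = list(range(1, 892))

total = sum(ids)
mean = statistics.mean(ids)
variance = statistics.variance(ids)  # sample variance (n - 1 denominator)
std = statistics.stdev(ids)          # square root of the sample variance
cv = std / mean                      # coefficient of variation

# Quartiles with linear interpolation between closest ranks.
q1, median, q3 = statistics.quantiles(ids, n=4, method="inclusive")
iqr = q3 - q1

print(total, mean, variance, round(std, 6), round(cv, 10))
print(q1, median, q3, iqr)
```

Under these assumptions the recomputed sum, mean, variance, quartiles, and IQR agree with the values listed above.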
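Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
891 | 1 | 0.1%
293 | 1 | 0.1%
304 | 1 | 0.1%
303 | 1 | 0.1%
302 | 1 | 0.1%
301 | 1 | 0.1%
300 | 1 | 0.1%
299 | 1 | 0.1%
298 | 1 | 0.1%
297 | 1 | 0.1%
296 | 1 | 0.1%
295 | 1 | 0.1%
294 | 1 | 0.1%
292 | 1 | 0.1%
306 | 1 | 0.1%
291 | 1 | 0.1%
290 | 1 | 0.1%
289 | 1 | 0.1%
288 | 1 | 0.1%
287 | 1 | 0.1%
286 | 1 | 0.1%
285 | 1 | 0.1%
284 | 1 | 0.1%
283 | 1 | 0.1%
282 | 1 | 0.1%
Other values (866) | 866 | 97.2%

Lowest values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Highest values

Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%
885 | 1 | 0.1%
884 | 1 | 0.1%
883 | 1 | 0.1%
882 | 1 | 0.1%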

Survived
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
0 | 549 | 61.6%
1 | 342 | 38.4%

Pclass
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
3 | 491 | 55.1%
1 | 216 | 24.2%
2 | 184 | 20.7%

Length

Max length1
Median length1
Mean length1
Min length1

Overview of Unicode Properties

Unique unicode characters3
Unique unicode categories (?)1
Unique unicode scripts (?)1
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
349155.1%
 
121624.2%
 
218420.7%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number891100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
349155.1%
 
121624.2%
 
218420.7%
 

Most occurring scripts

ValueCountFrequency (%) 
Common891100.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
349155.1%
 
121624.2%
 
218420.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII891100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
349155.1%
 
121624.2%
 
218420.7%
 

Name
Categorical

UNIQUE

Distinct count: 891
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
Ivanoff, Mr. Kanio | 1 | 0.1%
Bengtsson, Mr. John Viktor | 1 | 0.1%
Johnson, Mr. William Cahoone Jr | 1 | 0.1%
Madigan, Miss. Margaret "Maggie" | 1 | 0.1%
Andersson, Mr. Anders Johan | 1 | 0.1%
Ryerson, Miss. Emily Borie | 1 | 0.1%
Davies, Mr. Alfred J | 1 | 0.1%
Harris, Mr. Henry Birkhardt | 1 | 0.1%
Flynn, Mr. John Irwin ("Irving") | 1 | 0.1%
Novel, Mr. Mansouer | 1 | 0.1%
Laroche, Miss. Simonne Marie Anne Andree | 1 | 0.1%
McCormack, Mr. Thomas Joseph | 1 | 0.1%
Hoyt, Mr. Frederick Maxfield | 1 | 0.1%
Foreman, Mr. Benjamin Laventall | 1 | 0.1%
Pasic, Mr. Jakob | 1 | 0.1%
Silven, Miss. Lyyli Karoliina | 1 | 0.1%
Davis, Miss. Mary | 1 | 0.1%
Dahl, Mr. Karl Edwart | 1 | 0.1%
Chibnall, Mrs. (Edith Martha Bowerman) | 1 | 0.1%
Penasco y Castellana, Mr. Victor de Satode | 1 | 0.1%
Keane, Miss. Nora A | 1 | 0.1%
Karaic, Mr. Milan | 1 | 0.1%
Touma, Mrs. Darwis (Hanne Youssef Razi) | 1 | 0.1%
Peduzzi, Mr. Joseph | 1 | 0.1%
Hansen, Mr. Henrik Juul | 1 | 0.1%
Other values (866) | 866 | 97.2%

Length

Max length82
Median length25
Mean length26.96520763
Min length12

Overview of Unicode Properties

Unique unicode characters60
Unique unicode categories (?)7
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
273511.4%
 
r19588.1%
 
e17037.1%
 
a16576.9%
 
i13255.5%
 
n13045.4%
 
s12975.4%
 
M11284.7%
 
l10674.4%
 
o10084.2%
 
.8923.7%
 
,8913.7%
 
t6672.8%
 
h5172.2%
 
d4852.0%
 
m3861.6%
 
u3411.4%
 
c2841.2%
 
y2511.0%
 
A2501.0%
 
g2351.0%
 
k2190.9%
 
J2150.9%
 
H2030.8%
 
S1800.7%
 
Other values (35)282811.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter1544664.3%
 
Uppercase Letter364515.2%
 
Space Separator273511.4%
 
Other Punctuation18997.9%
 
Open Punctuation1440.6%
 
Close Punctuation1440.6%
 
Dash Punctuation130.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M112830.9%
 
A2506.9%
 
J2155.9%
 
H2035.6%
 
S1804.9%
 
C1724.7%
 
E1664.6%
 
W1433.9%
 
B1403.8%
 
L1293.5%
 
G1143.1%
 
R1123.1%
 
P1103.0%
 
F1032.8%
 
D992.7%
 
T862.4%
 
K722.0%
 
N701.9%
 
O451.2%
 
V441.2%
 
I340.9%
 
Y130.4%
 
Z70.2%
 
U50.1%
 
Q50.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
r195812.7%
 
e170311.0%
 
a165710.7%
 
i13258.6%
 
n13048.4%
 
s12978.4%
 
l10676.9%
 
o10086.5%
 
t6674.3%
 
h5173.3%
 
d4853.1%
 
m3862.5%
 
u3412.2%
 
c2841.8%
 
y2511.6%
 
g2351.5%
 
k2191.4%
 
b1671.1%
 
f1591.0%
 
v1240.8%
 
w990.6%
 
p890.6%
 
z440.3%
 
j300.2%
 
x250.2%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.89247.0%
 
,89146.9%
 
"1065.6%
 
'90.5%
 
/10.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
2735100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(144100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)144100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-13100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1909179.5%
 
Common493520.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
r195810.3%
 
e17038.9%
 
a16578.7%
 
i13256.9%
 
n13046.8%
 
s12976.8%
 
M11285.9%
 
l10675.6%
 
o10085.3%
 
t6673.5%
 
h5172.7%
 
d4852.5%
 
m3862.0%
 
u3411.8%
 
c2841.5%
 
y2511.3%
 
A2501.3%
 
g2351.2%
 
k2191.1%
 
J2151.1%
 
H2031.1%
 
S1800.9%
 
C1720.9%
 
b1670.9%
 
E1660.9%
 
Other values (26)190610.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
273555.4%
 
.89218.1%
 
,89118.1%
 
(1442.9%
 
)1442.9%
 
"1062.1%
 
-130.3%
 
'90.2%
 
/1< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII24026100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
273511.4%
 
r19588.1%
 
e17037.1%
 
a16576.9%
 
i13255.5%
 
n13045.4%
 
s12975.4%
 
M11284.7%
 
l10674.4%
 
o10084.2%
 
.8923.7%
 
,8913.7%
 
t6672.8%
 
h5172.2%
 
d4852.0%
 
m3861.6%
 
u3411.4%
 
c2841.2%
 
y2511.0%
 
A2501.0%
 
g2351.0%
 
k2190.9%
 
J2150.9%
 
H2030.8%
 
S1800.7%
 
Other values (35)282811.8%
 

Sex
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
male | 577 | 64.8%
female | 314 | 35.2%

Length

Max length6
Median length4
Mean length4.704826038
Min length4

Overview of Unicode Properties

Unique unicode characters5
Unique unicode categories (?)1
Unique unicode scripts (?)1
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e120528.7%
 
m89121.3%
 
a89121.3%
 
l89121.3%
 
f3147.5%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter4192100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e120528.7%
 
m89121.3%
 
a89121.3%
 
l89121.3%
 
f3147.5%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin4192100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e120528.7%
 
m89121.3%
 
a89121.3%
 
l89121.3%
 
f3147.5%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII4192100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e120528.7%
 
m89121.3%
 
a89121.3%
 
l89121.3%
 
f3147.5%
 

Age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 88
Unique (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911764705882
Minimum: 0.42
Maximum: 80.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
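Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
24 | 30 | 3.4%
22 | 27 | 3.0%
18 | 26 | 2.9%
28 | 25 | 2.8%
19 | 25 | 2.8%
30 | 25 | 2.8%
21 | 24 | 2.7%
25 | 23 | 2.6%
36 | 22 | 2.5%
29 | 20 | 2.2%
32 | 18 | 2.0%
26 | 18 | 2.0%
35 | 18 | 2.0%
27 | 18 | 2.0%
16 | 17 | 1.9%
31 | 17 | 1.9%
34 | 15 | 1.7%
23 | 15 | 1.7%
33 | 15 | 1.7%
20 | 15 | 1.7%
39 | 14 | 1.6%
17 | 13 | 1.5%
42 | 13 | 1.5%
40 | 13 | 1.5%
45 | 12 | 1.3%
Other values (63) | 236 | 26.5%
(Missing) | 177 | 19.9%

Lowest values

Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.2%
0.83 | 2 | 0.2%
0.92 | 1 | 0.1%
1 | 7 | 0.8%
2 | 10 | 1.1%
3 | 6 | 0.7%
4 | 10 | 1.1%
5 | 4 | 0.4%

Highest values

Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%
66 | 1 | 0.1%
65 | 3 | 0.3%
64 | 2 | 0.2%
63 | 2 | 0.2%
62 | 4 | 0.4%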

SibSp
Real number (ℝ≥0)

ZEROS

Distinct count: 7
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563411896
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
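Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
4 | 18 | 2.0%
3 | 16 | 1.8%
8 | 7 | 0.8%
5 | 5 | 0.6%

Lowest values

Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
3 | 16 | 1.8%
4 | 18 | 2.0%
5 | 5 | 0.6%
8 | 7 | 0.8%

Highest values

Value | Count | Frequency (%)
8 | 7 | 0.8%
5 | 5 | 0.6%
4 | 18 | 2.0%
3 | 16 | 1.8%
2 | 28 | 3.1%
1 | 209 | 23.5%
0 | 608 | 68.2%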

Parch
Real number (ℝ≥0)

ZEROS

Distinct count: 7
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.38159371492704824
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
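Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
5 | 5 | 0.6%
3 | 5 | 0.6%
4 | 4 | 0.4%
6 | 1 | 0.1%

Lowest values

Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
3 | 5 | 0.6%
4 | 4 | 0.4%
5 | 5 | 0.6%
6 | 1 | 0.1%

Highest values

Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.6%
4 | 4 | 0.4%
3 | 5 | 0.6%
2 | 80 | 9.0%
1 | 118 | 13.2%
0 | 678 | 76.1%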

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 681
Unique (%): 76.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
1601 | 7 | 0.8%
347082 | 7 | 0.8%
CA. 2343 | 7 | 0.8%
3101295 | 6 | 0.7%
CA 2144 | 6 | 0.7%
347088 | 6 | 0.7%
382652 | 5 | 0.6%
S.O.C. 14879 | 5 | 0.6%
347077 | 4 | 0.4%
113781 | 4 | 0.4%
2666 | 4 | 0.4%
113760 | 4 | 0.4%
PC 17757 | 4 | 0.4%
19950 | 4 | 0.4%
W./C. 6608 | 4 | 0.4%
349909 | 4 | 0.4%
LINE | 4 | 0.4%
17421 | 4 | 0.4%
4133 | 4 | 0.4%
110152 | 3 | 0.3%
230080 | 3 | 0.3%
SC/Paris 2123 | 3 | 0.3%
345773 | 3 | 0.3%
29106 | 3 | 0.3%
363291 | 3 | 0.3%
Other values (656) | 780 | 87.5%

Length

Max length18
Median length6
Mean length6.750841751
Min length3

Overview of Unicode Properties

Unique unicode characters35
Unique unicode categories (?)5
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
374612.4%
 
168911.5%
 
25949.9%
 
74908.1%
 
44647.7%
 
64227.0%
 
04066.7%
 
53876.4%
 
93285.5%
 
82824.7%
 
2394.0%
 
.1973.3%
 
C1512.5%
 
O1001.7%
 
/981.6%
 
P981.6%
 
A821.4%
 
S741.2%
 
N400.7%
 
T360.6%
 
W160.3%
 
Q150.2%
 
I110.2%
 
E70.1%
 
R70.1%
 
Other values (10)360.6%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number480879.9%
 
Uppercase Letter65210.8%
 
Other Punctuation2954.9%
 
Space Separator2394.0%
 
Lowercase Letter210.3%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
C15123.2%
 
O10015.3%
 
P9815.0%
 
A8212.6%
 
S7411.3%
 
N406.1%
 
T365.5%
 
W162.5%
 
Q152.3%
 
I111.7%
 
E71.1%
 
R71.1%
 
F71.1%
 
L40.6%
 
H30.5%
 
B10.2%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.19766.8%
 
/9833.2%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
374615.5%
 
168914.3%
 
259412.4%
 
749010.2%
 
44649.7%
 
64228.8%
 
04068.4%
 
53878.0%
 
93286.8%
 
82825.9%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
239100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a628.6%
 
s523.8%
 
r419.0%
 
i419.0%
 
l14.8%
 
e14.8%
 

Most occurring scripts

ValueCountFrequency (%) 
Common534288.8%
 
Latin67311.2%
 

Most frequent Latin characters

ValueCountFrequency (%) 
C15122.4%
 
O10014.9%
 
P9814.6%
 
A8212.2%
 
S7411.0%
 
N405.9%
 
T365.3%
 
W162.4%
 
Q152.2%
 
I111.6%
 
E71.0%
 
R71.0%
 
F71.0%
 
a60.9%
 
s50.7%
 
r40.6%
 
i40.6%
 
L40.6%
 
H30.4%
 
B10.1%
 
l10.1%
 
e10.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
374614.0%
 
168912.9%
 
259411.1%
 
74909.2%
 
44648.7%
 
64227.9%
 
04067.6%
 
53877.2%
 
93286.1%
 
82825.3%
 
2394.5%
 
.1973.7%
 
/981.8%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6015100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
374612.4%
 
168911.5%
 
25949.9%
 
74908.1%
 
44647.7%
 
64227.0%
 
04066.7%
 
53876.4%
 
93285.5%
 
82824.7%
 
2394.0%
 
.1973.3%
 
C1512.5%
 
O1001.7%
 
/981.6%
 
P981.6%
 
A821.4%
 
S741.2%
 
N400.7%
 
T360.6%
 
W160.3%
 
Q150.2%
 
I110.2%
 
E70.1%
 
R70.1%
 
Other values (10)360.6%
 

Fare
Real number (ℝ≥0)

ZEROS

Distinct count: 248
Unique (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.204207968574636
Minimum: 0.0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.9104
median: 14.4542
Q3: 31
95-th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
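Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
8.05 | 43 | 4.8%
13 | 42 | 4.7%
7.8958 | 38 | 4.3%
7.75 | 34 | 3.8%
26 | 31 | 3.5%
10.5 | 24 | 2.7%
7.925 | 18 | 2.0%
7.775 | 16 | 1.8%
26.55 | 15 | 1.7%
0 | 15 | 1.7%
7.2292 | 15 | 1.7%
7.8542 | 13 | 1.5%
8.6625 | 13 | 1.5%
7.25 | 13 | 1.5%
7.225 | 12 | 1.3%
16.1 | 9 | 1.0%
9.5 | 9 | 1.0%
24.15 | 8 | 0.9%
15.5 | 8 | 0.9%
56.4958 | 7 | 0.8%
52 | 7 | 0.8%
14.5 | 7 | 0.8%
14.4542 | 7 | 0.8%
69.55 | 7 | 0.8%
7.05 | 7 | 0.8%
Other values (223) | 473 | 53.1%

Lowest values

Value | Count | Frequency (%)
0 | 15 | 1.7%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%
6.45 | 1 | 0.1%
6.4958 | 2 | 0.2%
6.75 | 2 | 0.2%
6.8583 | 1 | 0.1%
6.95 | 1 | 0.1%

Highest values

Value | Count | Frequency (%)
512.3292 | 3 | 0.3%
263 | 4 | 0.4%
262.375 | 2 | 0.2%
247.5208 | 2 | 0.2%
227.525 | 4 | 0.4%
221.7792 | 1 | 0.1%
211.5 | 1 | 0.1%
211.3375 | 3 | 0.3%
164.8667 | 2 | 0.2%
153.4625 | 3 | 0.3%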

Cabin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct count: 147
Unique (%): 72.1%
Missing: 687
Missing (%): 77.1%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
C23 C25 C27 | 4 | 0.4%
B96 B98 | 4 | 0.4%
G6 | 4 | 0.4%
F2 | 3 | 0.3%
C22 C26 | 3 | 0.3%
E101 | 3 | 0.3%
F33 | 3 | 0.3%
D | 3 | 0.3%
C52 | 2 | 0.2%
F G73 | 2 | 0.2%
C65 | 2 | 0.2%
D33 | 2 | 0.2%
B57 B59 B63 B66 | 2 | 0.2%
E8 | 2 | 0.2%
C92 | 2 | 0.2%
B5 | 2 | 0.2%
C93 | 2 | 0.2%
D36 | 2 | 0.2%
B51 B53 B55 | 2 | 0.2%
D26 | 2 | 0.2%
C126 | 2 | 0.2%
B58 B60 | 2 | 0.2%
C123 | 2 | 0.2%
B77 | 2 | 0.2%
E24 | 2 | 0.2%
Other values (122) | 143 | 16.0%
(Missing) | 687 | 77.1%

Length

Max length15
Median length3
Mean length3.134680135
Min length1

Overview of Unicode Properties

Unique unicode characters21
Unique unicode categories (?)4
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
n137449.2%
 
a68724.6%
 
2722.6%
 
C712.5%
 
B642.3%
 
1612.2%
 
3592.1%
 
6511.8%
 
5451.6%
 
8371.3%
 
4371.3%
 
D341.2%
 
341.2%
 
7341.2%
 
E331.2%
 
9331.2%
 
0311.1%
 
A150.5%
 
F130.5%
 
G70.3%
 
T1< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter206173.8%
 
Decimal Number46016.5%
 
Uppercase Letter2388.5%
 
Space Separator341.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n137466.7%
 
a68733.3%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
C7129.8%
 
B6426.9%
 
D3414.3%
 
E3313.9%
 
A156.3%
 
F135.5%
 
G72.9%
 
T10.4%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
27215.7%
 
16113.3%
 
35912.8%
 
65111.1%
 
5459.8%
 
8378.0%
 
4378.0%
 
7347.4%
 
9337.2%
 
0316.7%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
34100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin229982.3%
 
Common49417.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n137459.8%
 
a68729.9%
 
C713.1%
 
B642.8%
 
D341.5%
 
E331.4%
 
A150.7%
 
F130.6%
 
G70.3%
 
T1< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
27214.6%
 
16112.3%
 
35911.9%
 
65110.3%
 
5459.1%
 
8377.5%
 
4377.5%
 
346.9%
 
7346.9%
 
9336.7%
 
0316.3%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII2793100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
n137449.2%
 
a68724.6%
 
2722.6%
 
C712.5%
 
B642.3%
 
1612.2%
 
3592.1%
 
6511.8%
 
5451.6%
 
8371.3%
 
4371.3%
 
D341.2%
 
341.2%
 
7341.2%
 
E331.2%
 
9331.2%
 
0311.1%
 
A150.5%
 
F130.5%
 
G70.3%
 
T1< 0.1%
 

Embarked
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
S | 644 | 72.3%
C | 168 | 18.9%
Q | 77 | 8.6%
(Missing) | 2 | 0.2%

Length

Max length3
Median length1
Mean length1.004489338
Min length1

Overview of Unicode Properties

Unique unicode characters5
Unique unicode categories (?)2
Unique unicode scripts (?)1
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
S64472.0%
 
C16818.8%
 
Q778.6%
 
n40.4%
 
a20.2%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter88999.3%
 
Lowercase Letter60.7%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S64472.4%
 
C16818.9%
 
Q778.7%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n466.7%
 
a233.3%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin895100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
S64472.0%
 
C16818.8%
 
Q778.6%
 
n40.4%
 
a20.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII895100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
S64472.0%
 
C16818.8%
 
Q778.6%
 
n40.4%
 
a20.2%
 

Relatives
Real number (ℝ≥0)

ZEROS

Distinct count: 9
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9046015712682379
Minimum: 0
Maximum: 10
Zeros: 537
Zeros (%): 60.3%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.613458541
Coefficient of variation (CV): 1.783612358
Kurtosis: 9.15966597
Mean: 0.9046015713
Median Absolute Deviation (MAD): 0
Skewness: 2.727441474
Sum: 806
Variance: 2.603248465
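Histogram with fixed size bins (bins=10)

Most common values

Value | Count | Frequency (%)
0 | 537 | 60.3%
1 | 161 | 18.1%
2 | 102 | 11.4%
3 | 29 | 3.3%
5 | 22 | 2.5%
4 | 15 | 1.7%
6 | 12 | 1.3%
10 | 7 | 0.8%
7 | 6 | 0.7%

Lowest values

Value | Count | Frequency (%)
0 | 537 | 60.3%
1 | 161 | 18.1%
2 | 102 | 11.4%
3 | 29 | 3.3%
4 | 15 | 1.7%
5 | 22 | 2.5%
6 | 12 | 1.3%
7 | 6 | 0.7%
10 | 7 | 0.8%

Highest values

Value | Count | Frequency (%)
10 | 7 | 0.8%
7 | 6 | 0.7%
6 | 12 | 1.3%
5 | 22 | 2.5%
4 | 15 | 1.7%
3 | 29 | 3.3%
2 | 102 | 11.4%
1 | 161 | 18.1%
0 | 537 | 60.3%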

AgeRange
Categorical

Distinct count: 5
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count | Frequency (%)
adulto | 590 | 66.2%
nada | 177 | 19.9%
criança | 62 | 7.0%
adolescente | 51 | 5.7%
idoso | 11 | 1.2%

Length

Max length11
Median length6
Mean length5.946127946
Min length4

Overview of Unicode Properties

Unique unicode characters13
Unique unicode categories (?)1
Unique unicode scripts (?)1
Unique unicode blocks (?)2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a111921.1%
 
d82915.6%
 
o66312.5%
 
l64112.1%
 
t64112.1%
 
u59011.1%
 
n2905.5%
 
e1532.9%
 
c1132.1%
 
i731.4%
 
r621.2%
 
ç621.2%
 
s621.2%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter5298100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a111921.1%
 
d82915.6%
 
o66312.5%
 
l64112.1%
 
t64112.1%
 
u59011.1%
 
n2905.5%
 
e1532.9%
 
c1132.1%
 
i731.4%
 
r621.2%
 
ç621.2%
 
s621.2%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin5298100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a111921.1%
 
d82915.6%
 
o66312.5%
 
l64112.1%
 
t64112.1%
 
u59011.1%
 
n2905.5%
 
e1532.9%
 
c1132.1%
 
i731.4%
 
r621.2%
 
ç621.2%
 
s621.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII523698.8%
 
None621.2%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a111921.4%
 
d82915.8%
 
o66312.7%
 
l64112.2%
 
t64112.2%
 
u59011.3%
 
n2905.5%
 
e1532.9%
 
c1132.2%
 
i731.4%
 
r621.2%
 
s621.2%
 

Most frequent None characters

ValueCountFrequency (%) 
ç62100.0%
 

AgeNotNull
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 88
Unique (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911764705882
Minimum: 0.42
Maximum: 80.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
24303.4%
 
22273.0%
 
18262.9%
 
28252.8%
 
19252.8%
 
30252.8%
 
21242.7%
 
25232.6%
 
36222.5%
 
29202.2%
 
32182.0%
 
26182.0%
 
35182.0%
 
27182.0%
 
16171.9%
 
31171.9%
 
34151.7%
 
23151.7%
 
33151.7%
 
20151.7%
 
39141.6%
 
17131.5%
 
42131.5%
 
40131.5%
 
45121.3%
 
Other values (63)23626.5%
 
(Missing)17719.9%
 
ValueCountFrequency (%) 
0.4210.1%
 
0.6710.1%
 
0.7520.2%
 
0.8320.2%
 
0.9210.1%
 
170.8%
 
2101.1%
 
360.7%
 
4101.1%
 
540.4%
 
ValueCountFrequency (%) 
8010.1%
 
7410.1%
 
7120.2%
 
70.510.1%
 
7020.2%
 
6610.1%
 
6530.3%
 
6420.2%
 
6320.2%
 
6240.4%
 

AgeFillNa-1
Real number (ℝ)

HIGH CORRELATION

Distinct count: 89
Unique (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.600639730639728
Minimum: -1.0
Maximum: 80.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 6
median: 24
Q3: 35
95-th percentile: 54
Maximum: 80
Range: 81
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 17.86749639
Coefficient of variation (CV): 0.7570767823
Kurtosis: -0.562398231
Mean: 23.60063973
Median Absolute Deviation (MAD): 12
Skewness: 0.2225878406
Sum: 21028.17
Variance: 319.2474271
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
-117719.9%
 
24303.4%
 
22273.0%
 
18262.9%
 
28252.8%
 
19252.8%
 
30252.8%
 
21242.7%
 
25232.6%
 
36222.5%
 
29202.2%
 
32182.0%
 
26182.0%
 
35182.0%
 
27182.0%
 
16171.9%
 
31171.9%
 
34151.7%
 
23151.7%
 
33151.7%
 
20151.7%
 
39141.6%
 
17131.5%
 
42131.5%
 
40131.5%
 
Other values (64)24827.8%
 
ValueCountFrequency (%) 
-117719.9%
 
0.4210.1%
 
0.6710.1%
 
0.7520.2%
 
0.8320.2%
 
0.9210.1%
 
170.8%
 
2101.1%
 
360.7%
 
4101.1%
 
ValueCountFrequency (%) 
8010.1%
 
7410.1%
 
7120.2%
 
70.510.1%
 
7020.2%
 
6610.1%
 
6530.3%
 
6420.2%
 
6320.2%
 
6240.4%
 

AgeFillNaSexMean
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 90
Unique (%): 10.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.736034227171306
Minimum: 0.42
Maximum: 80.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 6
Q1: 22
median: 30
Q3: 35
95-th percentile: 54
Maximum: 80
Range: 79.58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 13.01489688
Coefficient of variation (CV): 0.4376809893
Kurtosis: 0.945698785
Mean: 29.73603423
Median Absolute Deviation (MAD): 6
Skewness: 0.4245857764
Sum: 26494.8065
Variance: 169.3875408
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
30.7266445912413.9%
 
27.91570881535.9%
 
24303.4%
 
22273.0%
 
18262.9%
 
28252.8%
 
19252.8%
 
30252.8%
 
21242.7%
 
25232.6%
 
36222.5%
 
29202.2%
 
27182.0%
 
35182.0%
 
26182.0%
 
32182.0%
 
16171.9%
 
31171.9%
 
33151.7%
 
20151.7%
 
23151.7%
 
34151.7%
 
39141.6%
 
40131.5%
 
42131.5%
 
Other values (65)26129.3%
 
ValueCountFrequency (%) 
0.4210.1%
 
0.6710.1%
 
0.7520.2%
 
0.8320.2%
 
0.9210.1%
 
170.8%
 
2101.1%
 
360.7%
 
4101.1%
 
540.4%
 
ValueCountFrequency (%) 
8010.1%
 
7410.1%
 
7120.2%
 
70.510.1%
 
7020.2%
 
6610.1%
 
6530.3%
 
6420.2%
 
6320.2%
 
6240.4%
 

InverseAge
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct count: 88
Unique (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: -29.69911764705882
Minimum: -80.0
Maximum: -0.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: -80
5-th percentile: -56
Q1: -38
median: -28
Q3: -20.125
95-th percentile: -4
Maximum: -0.42
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): -0.4891221855
Kurtosis: 0.1782741536
Mean: -29.69911765
Median Absolute Deviation (MAD): 9
Skewness: -0.3891077823
Sum: -21205.17
Variance: 211.0191247
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
-24303.4%
 
-22273.0%
 
-18262.9%
 
-28252.8%
 
-19252.8%
 
-30252.8%
 
-21242.7%
 
-25232.6%
 
-36222.5%
 
-29202.2%
 
-32182.0%
 
-26182.0%
 
-35182.0%
 
-27182.0%
 
-16171.9%
 
-31171.9%
 
-34151.7%
 
-23151.7%
 
-33151.7%
 
-20151.7%
 
-39141.6%
 
-17131.5%
 
-42131.5%
 
-40131.5%
 
-45121.3%
 
Other values (63)23626.5%
 
(Missing)17719.9%
 
ValueCountFrequency (%) 
-8010.1%
 
-7410.1%
 
-7120.2%
 
-70.510.1%
 
-7020.2%
 
-6610.1%
 
-6530.3%
 
-6420.2%
 
-6320.2%
 
-6240.4%
 
ValueCountFrequency (%) 
-0.4210.1%
 
-0.6710.1%
 
-0.7520.2%
 
-0.8320.2%
 
-0.9210.1%
 
-170.8%
 
-2101.1%
 
-360.7%
 
-4101.1%
 
-540.4%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
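
As a minimal sketch of this definition, the function below computes r in plain Python. The name `pearson_r` and the toy data are illustrative assumptions, not part of the report; the common 1/n factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant input has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)

# Toy data (illustrative only).
ages = [22.0, 38.0, 26.0, 35.0, 54.0]
negated = [-a for a in ages]          # sign flip
rescaled = [2 * a + 10 for a in ages]  # change of location and scale

print(pearson_r(ages, negated))
print(pearson_r(ages, rescaled))
```

A negated copy of a variable gives r close to -1 and a shifted, rescaled copy gives r close to +1, which is the invariance property described above and is consistent with the high-correlation warnings between Age and its derived columns.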

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
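
The following sketch applies that recipe directly: rank both variables, then compute the Pearson formula on the ranks. Giving tied values the average of their positions is an assumption (a common convention), and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A nonlinear but strictly increasing relationship (toy data).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because the cubic relationship is strictly increasing, the ranks of the two variables coincide and ρ comes out at 1 (up to floating-point rounding), even though the relationship is not linear.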

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
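
The pair-counting definition can be sketched as follows. This is the simple variant sometimes called tau-a, written as a direct O(n²) loop; the function name is hypothetical, and library implementations commonly use a tie-adjusted variant (tau-b), so results may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie: counted in the total but in neither group.
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the last call two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.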

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
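
A minimal sketch of a bias-corrected Cramér's V is given below. It follows the correction as it is commonly stated for Bergsma's estimator (subtracting (k-1)(r-1)/(n-1) from φ² and shrinking the row and column counts); whether the report's implementation matches this line for line is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form of Bergsma's 2013 estimator).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a single category
    return math.sqrt(phi2_corr / denom)

# Toy data: perfectly associated vs. independent categories.
perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```

On the toy data, perfectly associated categories give a value of 1 (up to rounding) and independent ones give exactly 0, matching the stated range of the coefficient.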

Missing values

Sample

First rows

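(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Relatives | AgeRange | AgeNotNull | AgeFillNa-1 | AgeFillNaSexMean | InverseAge
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 1 | adulto | 22.0 | 22.0 | 22.000000 | -22.0
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 1 | adulto | 38.0 | 38.0 | 38.000000 | -38.0
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 0 | adulto | 26.0 | 26.0 | 26.000000 | -26.0
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 1 | adulto | 35.0 | 35.0 | 35.000000 | -35.0
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | 0 | adulto | 35.0 | 35.0 | 35.000000 | -35.0
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q | 0 | nada | NaN | -1.0 | 30.726645 | NaN
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | 0 | adulto | 54.0 | 54.0 | 54.000000 | -54.0
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | 4 | criança | 2.0 | 2.0 | 2.000000 | -2.0
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S | 2 | adulto | 27.0 | 27.0 | 27.000000 | -27.0
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C | 1 | adolescente | 14.0 | 14.0 | 14.000000 | -14.0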

Last rows

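(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Relatives | AgeRange | AgeNotNull | AgeFillNa-1 | AgeFillNaSexMean | InverseAge
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S | 0 | adulto | 33.0 | 33.0 | 33.000000 | -33.0
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S | 0 | adulto | 22.0 | 22.0 | 22.000000 | -22.0
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S | 0 | adulto | 28.0 | 28.0 | 28.000000 | -28.0
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S | 0 | adulto | 25.0 | 25.0 | 25.000000 | -25.0
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q | 5 | adulto | 39.0 | 39.0 | 39.000000 | -39.0
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S | 0 | adulto | 27.0 | 27.0 | 27.000000 | -27.0
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | 0 | adulto | 19.0 | 19.0 | 19.000000 | -19.0
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S | 3 | nada | NaN | -1.0 | 27.915709 | NaN
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | 0 | adulto | 26.0 | 26.0 | 26.000000 | -26.0
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q | 0 | adulto | 32.0 | 32.0 | 32.000000 | -32.0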
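
The report does not say how the derived columns were built, but the sample rows are consistent with simple rules: Relatives equals SibSp plus Parch, AgeFillNa-1 replaces a missing Age with -1, AgeFillNaSexMean replaces it with the mean Age of the passenger's sex, and InverseAge is the negated Age. The sketch below encodes those inferred rules in plain Python; the function name and the rules themselves are assumptions read off the sample, and the AgeRange thresholds are not reconstructed because they cannot be recovered from the rows shown.

```python
from statistics import mean

# A few rows in the shape of the sample above (None stands in for NaN).
rows = [
    {"Sex": "male", "Age": 22.0, "SibSp": 1, "Parch": 0},
    {"Sex": "female", "Age": 38.0, "SibSp": 1, "Parch": 0},
    {"Sex": "male", "Age": None, "SibSp": 0, "Parch": 0},
    {"Sex": "male", "Age": 2.0, "SibSp": 3, "Parch": 1},
    {"Sex": "female", "Age": None, "SibSp": 1, "Parch": 2},
]

def add_derived_columns(rows):
    """Return copies of the rows with the inferred derived columns added."""
    # Mean known Age per sex, computed only over the rows passed in.
    ages_by_sex = {}
    for row in rows:
        if row["Age"] is not None:
            ages_by_sex.setdefault(row["Sex"], []).append(row["Age"])
    sex_mean = {sex: mean(ages) for sex, ages in ages_by_sex.items()}

    out = []
    for row in rows:
        age = row["Age"]
        new = dict(row)
        new["Relatives"] = row["SibSp"] + row["Parch"]
        new["AgeFillNa-1"] = -1.0 if age is None else age
        new["AgeFillNaSexMean"] = sex_mean[row["Sex"]] if age is None else age
        new["InverseAge"] = None if age is None else -age
        out.append(new)
    return out

derived = add_derived_columns(rows)
for row in derived:
    print(row)
```

On the full dataset the per-sex means would be the values visible in the sample (30.726645 for male, 27.915709 for female); on this five-row subset they are of course different.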